Draft Genome Sequence of Trichomonas tenax Strain Hs-4:NIH

ABSTRACT Trichomonas tenax is a flagellated parasite that plays an important role in periodontal disease, with high prevalence worldwide. Its pathogenesis remains largely unknown, and there is very little information on its genome. Here, we present the whole-genome shotgun sequence of T. tenax strain Hs-4:NIH.

Trichomonas tenax is usually found colonizing the oral cavity, tartar, and caries of humans and dogs. It is transmitted among hosts by direct contact (kissing), droplet spray, or contaminated food (1). Infection with T. tenax occurs globally, with prevalence rates ranging from 4% to 55.9% depending on geographical location and oral hygiene (2–6). Trichomonas protozoa are an ancient eukaryotic lineage without mitochondria; they have a hydrogenosome instead, a primitive organelle that is phylogenetically related to the fully developed mitochondrion (7, 8). Here, we report a whole-genome shotgun sequence of T. tenax strain Hs-4:NIH, which should be a valuable resource for further studies on the pathogenesis, gene function, and transcriptional regulatory mechanisms of T. tenax.
T. tenax cells (ATCC 30207; ATCC, Manassas, VA, USA) were cultured in Diamond's medium as described previously (9–12). Briefly, they were incubated at 35°C for 5 days in a round-bottomed Falcon 14-mL tube (Fisher Scientific, Ottawa, Ontario, Canada) filled with 13 mL Diamond's medium supplemented with 10% horse serum (Sigma-Aldrich, St. Louis, MO, USA) plus 100 U/mL ampicillin and 100 mg/mL streptomycin (Thermo Fisher Scientific, Waltham, MA, USA). Cell density was monitored daily using a hemacytometer (Hausser Scientific, Horsham, PA, USA). Cells at a density of 2 × 10^6 to 5 × 10^6 cells/mL were collected by centrifugation at 800 × g at 4°C for 5 min, followed by three washes in phosphate-buffered saline (pH 7.2) with the same centrifugation. Total genomic DNA was extracted from the pelleted cells using the DNeasy blood and tissue kit (Qiagen, Hilden, Germany).
It was shown previously that T. tenax has six haploid chromosomes, each with two parallel chromatids (13). In the current study, the genome was sequenced to 50× coverage at the Iowa Institute of Human Genetics, University of Iowa, using the NovaSeq 6000 sequencing system (Illumina, San Diego, CA, USA). All programs used in this whole-genome project were run with default settings unless otherwise noted. FastQC (version 0.11.5) was used for quality control assessment of reads. SPAdes (version 3.13.0) was used for contig construction. Statistics are shown in Table 1. The genome of the closely related trichomonad protozoan Trichomonas vaginalis is highly repetitive, with repetitive sequences accounting for at least 65% of its genome, as determined by comparing the observed k-mer frequencies with the Poisson distribution expected for a repeat-free genome (14). The 63.4 Mb of ≥1,000-bp contigs is a little less than one-half of the estimated genome size of 133 Mb for T. tenax, suggesting a highly repetitive genome like that of T. vaginalis.
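The comparison between assembled contig length and estimated genome size can be illustrated with a minimal Python sketch. It is not the code used in this study; the function name `contig_summary`, its parameters, and the toy contig lengths are hypothetical and serve only to show how the total length of ≥1,000-bp contigs, the N50, and the assembled fraction of an estimated genome size might be computed.

```python
from typing import Dict, Iterable


def contig_summary(lengths: Iterable[int], min_len: int = 1000,
                   est_genome_bp: float = 133e6) -> Dict[str, float]:
    """Summarize contigs of at least min_len bp (hypothetical helper)."""
    # Keep only contigs that pass the length cutoff, longest first.
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    total = sum(kept)
    # N50: length of the contig at which the running sum of lengths
    # first reaches half of the total retained length.
    running, n50 = 0, 0
    for n in kept:
        running += n
        if running * 2 >= total:
            n50 = n
            break
    return {
        "n_contigs": len(kept),
        "total_bp": total,
        "n50": n50,
        "fraction_of_estimate": total / est_genome_bp,
    }


# Toy contig lengths (illustrative only, not data from this assembly).
toy_lengths = [5000, 3000, 2000, 900, 500]
print(contig_summary(toy_lengths, est_genome_bp=20000))

# Fraction implied by the figures reported in the text (63.4 Mb of 133 Mb).
print(round(63.4e6 / 133e6, 3))
```

With the values reported above, the assembled fraction is about 0.48, which is consistent with the statement that the ≥1,000-bp contigs cover a little less than one-half of the estimated genome.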
The AUGUSTUS program (version 3.4.0) was used for gene prediction on the assembled contigs. A Docker container was created from the Dockerfile available in the AUGUSTUS repository (https://github.com/Gaius-Augustus/Augustus) for use on the University of Iowa high-performance computing cluster. The eukaryotic annotation pipeline predicted 4,429 protein-coding genes and 31 small GTPase Rab1 family members.
The InterPro program (InterProScan version 5.51-85.0) was used for functional prediction. Protein sequences were extracted from the AUGUSTUS output gff files (https://github.com/keenhl/mra_paper). To satisfy the dependency of InterPro on Java (version 11), a dedicated environment was created with Conda. Cysteine proteases are well-known virulence factors in T. vaginalis (15). In this draft genome of T. tenax, we were able to identify more than 10 types of proteases, including but not limited to cysteine protease family C1 related, papain cysteine protease (C1) family, ubiquitin-specific protease and calpain cysteine protease (C2) family, ubiquitin-like-conjugating enzyme ATG3, cysteine proteinase 3, cysteine synthase 1, cathepsin L-like cysteine peptidase, cysteine protease Rdl2 related, and cysteine desulfurylase family, highlighting the diversity of proteases.
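The extraction of protein sequences from AUGUSTUS gff output, mentioned above, can be sketched in Python as follows. This is not the authors' script (theirs is in the repository cited above); it is a minimal illustration that assumes the AUGUSTUS comment layout in which each gene block opens with `# start gene <id>` and the translation appears as `# protein sequence = [...]`, possibly wrapped over several comment lines. The function name `augustus_proteins` is hypothetical, and the sketch assumes one transcript per gene.

```python
import re
from typing import Dict, Iterable


def augustus_proteins(lines: Iterable[str]) -> Dict[str, str]:
    """Collect protein sequences from AUGUSTUS GFF comment lines.

    Hypothetical helper: assumes one transcript per gene, so a later
    transcript of the same gene would overwrite an earlier one.
    """
    proteins: Dict[str, str] = {}
    gene_id = None
    buffer = None  # None while we are outside a protein block
    for raw in lines:
        line = raw.strip()
        start = re.match(r"#\s*start gene\s+(\S+)", line)
        if start:
            gene_id = start.group(1)
            continue
        if buffer is None:
            # Look for the opening of a protein block.
            opened = re.match(r"#\s*protein sequence\s*=\s*\[(.*)", line)
            if not opened:
                continue
            buffer = ""
            chunk = opened.group(1)
        else:
            # Continuation line: drop the leading comment marker.
            chunk = line.lstrip("#").strip()
        if chunk.endswith("]"):
            # Closing bracket ends the block; store the joined sequence.
            proteins[gene_id] = buffer + chunk[:-1]
            buffer = None
        else:
            buffer += chunk
    return proteins


# Toy input in the assumed layout (not real output from this genome).
sample = """# start gene g1
contig1\tAUGUSTUS\tgene\t1\t90\t0.9\t+\t.\tg1
# protein sequence = [MSTKLV
# AAGH]
# end gene g1
# start gene g2
# protein sequence = [MKK]
# end gene g2
""".splitlines()

for name, seq in augustus_proteins(sample).items():
    print(f">{name}\n{seq}")
```

The resulting FASTA-style records could then be passed to a functional annotation tool; AUGUSTUS also ships its own helper script for this purpose, which may be preferable in practice.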
In summary, a draft genome sequence of T. tenax was generated in the current study. This genome should lay a foundation for future studies on molecular pathogenesis and host-parasite interactions of this widespread trichomonad protozoan using genomics, proteomics, and transcriptomics. It also makes comparative genomic analyses possible between two highly prevalent human parasites, T. vaginalis and T. tenax.
Data availability. Genomic sequences have been deposited in the NCBI Genome Sequence (SUB10957852) and NCBI GenBank (BioProject number PRJNA797354) databases. This whole-genome shotgun project has been deposited at DDBJ/ENA/GenBank under the accession number JALKBU000000000. The version described in this paper is version JALKBU010000000.

ACKNOWLEDGMENTS
The work was partially supported by intramural grants (Trichomonas genome 41002-2021) of Ross University School of Veterinary Medicine. The publication cost was provided by the Associate Dean for Research and Postgraduate Studies of Ross University School of Veterinary Medicine.